Using DCOPs to Balance Exploration and Exploitation in Time-Critical Domains
Authors
Abstract
Substantial work has investigated balancing exploration and exploitation, but relatively little has addressed this tradeoff in the context of coordinated multi-agent interactions. This paper introduces a class of problems in which agents must maximize their on-line reward, a decomposable function dependent on pairs of agents' decisions. Unlike previous work, agents must both learn the reward function and exploit it on-line, critical properties for a class of physically-motivated systems, such as mobile wireless networks. This paper introduces algorithms motivated by the Distributed Constraint Optimization Problem framework and demonstrates when, and at what cost, increasing agents' coordination can improve the global reward on such problems.
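To make "a decomposable function dependent on pairs of agents' decisions" concrete, here is a minimal sketch of a global reward computed as a sum of pairwise terms. The agent names, the two reward functions, and the helper `global_reward` are hypothetical and chosen only for illustration; they are not taken from the paper. In the setting the abstract describes, the pairwise rewards would be unknown and learned on-line, whereas this sketch fixes them in advance.

```python
from typing import Callable, Dict, Tuple

Edge = Tuple[str, str]                    # a pair of interacting agents
PairReward = Callable[[int, int], float]  # reward for one pair of decisions


def global_reward(assignment: Dict[str, int],
                  pair_rewards: Dict[Edge, PairReward]) -> float:
    """Sum the pairwise rewards over every pair of interacting agents."""
    return sum(f(assignment[i], assignment[j])
               for (i, j), f in pair_rewards.items())


# Hypothetical three-agent chain a1 - a2 - a3 (illustrative values only).
pair_rewards: Dict[Edge, PairReward] = {
    ("a1", "a2"): lambda x, y: 10.0 - abs(x - y),
    ("a2", "a3"): lambda x, y: float(x + y),
}
assignment = {"a1": 0, "a2": 1, "a3": 2}
total = global_reward(assignment, pair_rewards)
```

Because the total is a sum over pairs, changing one agent's decision only affects the terms for pairs that include that agent, which is the structure DCOP-style algorithms rely on.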
Similar resources
DCOPs and bandits: exploration and exploitation in decentralised coordination
Real-life coordination problems are characterised by stochasticity and a lack of a priori knowledge about the interactions between agents. However, decentralised constraint optimisation problems (DCOPs), a widely adopted framework for modelling decentralised coordination tasks, assume perfect knowledge of these factors, thus limiting their practical applicability. To address this shortcoming, we...
Balance Within and Across Domains: The Performance Implications of Exploration and Exploitation in Alliances
Organizational research advocates that firms balance exploration and exploitation, yet it acknowledges inherent challenges in reconciling these opposing activities. To overcome these challenges, such research suggests that firms establish organizational separation between exploring and exploiting units or engage in temporal separation whereby they oscillate between exploration and exploitation ...
Balancing Exploration and Exploitation in Alliance Formation
Do firms balance exploration and exploitation in their alliance formation decisions and, if so, why and how? We argue that absorptive capacity and organizational inertia impose conflicting pressures for exploration and exploitation with respect to the value chain function of alliances, the attributes of partners, and partners’ network positions. Although path dependencies reinforce either explo...
Decentralized multi-agent reinforcement learning in average-reward dynamic DCOPs
Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Existing work typically assumes that the problem in each time step is decoupled from the problems in oth...
An Improved Bat Algorithm with Grey Wolf Optimizer for Solving Continuous Optimization Problems
Metaheuristic algorithms are used to solve NP-hard optimization problems. These algorithms have two main components, i.e. exploration and exploitation, and try to strike a balance between exploration and exploitation to achieve the best possible near-optimal solution. The bat algorithm is one of the metaheuristic algorithms with poor exploration and exploitation. In this paper, exploration and ...
Publication date: 2009